ABSTRACT: The number of studies based on herbarium data for analyzing biogeographical 
patterns and environmental questions is increasing, as herbaria are making their collections 
available online. However, the quality of a specimen’s spatial data still varies dramatically 
among records. Most historical specimen records either lack geographic information or 
have only vague textual descriptions about the locality, while contemporary records may 
exhibit unwarranted variation in spatial data quality, requiring increased awareness among 
mycologists about the importance of high quality primary spatial data for specimens. 
Georeferencing is the process of assigning geographic coordinates to a record, linking it to 
a geographic location on Earth. It can be done retrospectively for records without 
geographical coordinates, based on locality descriptions, or coordinates can be collected 
directly in the field using handheld GPS units. Here we provide an overview of methods for georeferencing 
historical data retrospectively, discuss practical recommendations for collecting high quality 
spatial data for fungal specimens, and suggest decimal degrees as a standard form for citing 
geographic coordinates. 

Introduction 

Fungal taxonomy aims to investigate the diversity of fungi on Earth, 
assign names, and propose phylogenetic relationships among species. 
Global estimates for fungi vary dramatically depending on the methods 
used (Blackwell 2011, Scheffers et al. 2012), ranging from 611,000 (Mora et 
al. 2011) up to 9.9 million species (Cannon 1997). Presently, there are almost 
100,000 known fungal species (Kirk et al. 2008), which surely represent only 
a small fraction of extant species. Specimens harbored in herbaria represent 
occurrence records of a taxon at a given location on a specific date and provide 
a fundamental reference to morphological and molecular studies that are 
necessary to ascribe species names (e.g., Brock et al. 2009, Osmundson et al. 
2013). They allow the delimitation and re-evaluation of a species' identity when 
more taxonomic and molecular knowledge becomes available (e.g., Cabral et al. 
2012). Herbaria throughout the world are increasingly making their collections 
available online, and consequently the number of studies based on herbarium 
data for analyzing biogeographical patterns and environmental questions is 
also increasing (Lavoie 2013). However, most biological specimen records 
either lack geographic coordinates or have only imprecise textual descriptions 
about the locality (Guralnick et al. 2006), severely limiting the comprehension 
of species distributions. Public databases containing fungal specimen data exist 
at global (e.g., Global Biodiversity Information Facility - GBIF, www.gbif.org/) 
and regional scales (e.g., speciesLink network, www.splink.org.br/), but the use 
of these data is still incipient among mycologists. 

In the case of fungi, herbarium data have been explored to investigate 
biogeographical patterns (Wu & Mueller 1997, Mueller et al. 2001, Oda et al. 
2004, Wollan et al. 2008, Geml et al. 2012, Wolfe et al. 2012) and climate change 
effects on sporocarp phenology (Kauserud et al. 2008, 2010, 2012), and to model 
the potential distribution of invaders (Wolfe et al. 2010, Wolfe & Pringle 2012). 
However, even as our knowledge about fungi improves, the quality of spatial data 
still varies dramatically among records. As expected, most historical records 
have vague textual descriptions of localities, and most labels lack geographical 
coordinates. Nevertheless, such unwarranted variation in spatial data quality 
can also be found in contemporary records, requiring an increased awareness 
by mycologists. 

Halme et al. (2012) discussed the need to rethink data collection, database 
structure, and organization, stressing the importance of standardizing data 
collection practices. However, they did not emphasize the importance of spatial 
data quality for documenting species occurrence. The scarcity and variability of 
data related to geographic coordinates have severe implications for biodiversity 
and related life science research, including taxonomy, phylogenetics, ecology, 
biogeography, biological monitoring (Halme et al. 2012), and conservation 
planning (Dahlberg & Mueller 2011, Molina et al. 2011). Without specimen 
occurrence information, one cannot make inferences about spatial processes 
that influence the delimitation and distribution of species. Geographic 
Information Systems (GIS), which were designed to manage and analyze spatial 
information, can be used to integrate fungal databases with other variables such 
as land cover, protected areas, and climatic layers of present, past, and future 
scenarios. The development of ecological niche models (Peterson et al. 2011), 
which use known occurrence points to estimate potential species distribution 
based on the ecological niche to define suitable sites, is another promising 
avenue for mycological research. These models are an important tool 
for taxonomy, ecology, evolution, conservation, epidemiology, invasive species 
management, and protected area planning, all of which depend on 
accurate and precise coordinates for specimens. 

Georeferencing is the process of assigning geographic coordinates to a 
record, linking it to a geographic location on Earth (Chapman & Wieczorek 
2006). Legacy specimens, which usually have only textual information regarding 
their locality, and recent collections made without geographic coordinates may 
be georeferenced retrospectively (Murphey et al. 2004), increasing the quality 
of specimen occurrence records. On the other hand, contemporary collectors 
could improve the accuracy and precision of their geographic coordinates by 
using Global Positioning System (GPS) handheld units. 

As mycologists may not be well trained in georeferencing, we shall 
discuss how to improve georeferencing quality for fungal specimens. Our 
initial motivation came from practical experience, as most fungal specimens 
now available from the speciesLink network (http://splink.cria.org.br/) — a 
Brazilian initiative that integrates primary biodiversity data for the Brazilian 
Virtual Herbarium of Plants and Fungi (http://inct.florabrasil.net/) — lack 
geographical coordinates. For instance, Schizophyllum commune Fr., a widely 
distributed and quite abundant species, has 410 records in the speciesLink network, 
but only 27 (~ 6.5%) could be selected to generate an ecological niche model. 
Record selection in this case took into account a number of quality criteria 
(Giovanni et al. 2012), but it is important to note that most records (85%!) were 
discarded because of the absolute lack of geographical coordinates. Due to the 
few records selected and their geographical bias, the generated model omits 
vast regions in the study area (Braga-Neto 2013a). Similarly, of the 594 records 
for Pycnoporus sanguineus (L.) Murrill available in speciesLink, only 47 (~ 8%) 
were selected (Braga-Neto 2013b). Therefore, so as to increase awareness about 
the importance of spatial data for fungal specimens, we provide an overview 
of methods to georeference historical data retrospectively, discuss practical 
recommendations for collecting high quality spatial data, and suggest a standard 
form for citing geographic coordinates. 


Accuracy, precision, coordinate system and datum 
Understanding the distinction between accuracy and precision is crucial, 
both for retrospective georeferencing of locality data and documenting 
localities (Murphey et al. 2004). Accuracy refers to how closely a measurement 
of a quantity corresponds to its true value (whereas precision is the degree to 
which repeated measurements show the same results). Accuracy depends on 
how data is collected and processed. At the time of collection, accuracy refers to 
the quality of the location originally reported by the specimen collector, which 
may be accurately or inaccurately described, regardless of the level of geographic 
detail recorded. In georeferencing, accuracy is related to correctly positioning 
the locality points based on available information and correctly entering the 
data into a spreadsheet or database. PRECISION, in the context of georeferencing, 
refers to the geographic extent potentially represented by the locality 
(FIGS. 1-3). It is a product of the original locality description or measurement 
and the georeferencing method applied. A georeference can be accurate but not 
precise, precise but not accurate, neither, or both. A description of a locality 
containing only information about the county may be accurate, but because a 
county is a large geographic area, it is relatively imprecise. On the other hand, a 
locality described retrospectively by latitude and longitude coordinates may be 
precise because only a small geographic area is involved, but inaccurate if the 
georeferencer assigned the values erroneously. Furthermore, a georeference can 
be inaccurate and imprecise if the coordinate measurement in the field contains 
a systematic error and the datum was specified incorrectly. The georeferencing 
process was designed to assign accurate and precise locality data to specimen 
records, but as every measurement involves some kind of error, it is very important 
to document uncertainties as they can determine the suitability of data for 
particular analyses (Rocchini et al. 2011). 


Also important is the geographic COORDINATE SYSTEM, which enables every 
location on Earth to be specified unambiguously. The most common global 
systems are Latitude/Longitude (preferentially expressed in decimal degrees) 
and the Universal Transverse Mercator (UTM), which uses a metric-based 
Cartesian grid to locate positions on the Earth's surface. The UTM system is 
not a single map projection but a series of map projections (known as zones), 
one for each of sixty 6-degree bands of longitude. In addition, the Earth is 
not a perfect sphere, so coordinates must refer to a model of its shape. The 
DATUM is a mathematical model of the size and shape of the Earth, and of the 
origin and orientation of coordinate systems. Coordinates without a horizontal 
datum do not uniquely specify a location, so failure to record the correct datum 
associated with coordinates can result in positional errors of hundreds to 
thousands of meters on a global scale (Wieczorek et al. 2004). Given this impact 
on precision, information about the datum used must be provided as an 
essential part of the coordinate description. Datum shifts must be taken into 
account when comparing data referenced to different datums or when re-projecting 
data, to avoid introducing errors. The most common global horizontal datum is WGS84 
(World Geodetic System 1984), but some regional datums are frequently 
used, such as NAD83 in North America and SAD69 in South America. 

FIGURES 1-3. Geographical projections of estimated errors as a function of the number of decimal 
places in the decimal degree format (Wieczorek et al. 2004). The most precise coordinate was used 
as a reference point around which errors were depicted as buffer zones; the darker the buffer the 
more precise. A specimen recorded originally with 5 decimal places (e.g., Latitude -22.33214, 
Longitude -48.87189) has very precise spatial data, because it includes an error radius of only 1.51 
m estimated for this latitude. 1. If the coordinate chosen as an example did not contain any decimal 
place (e.g., Latitude -22, Longitude -49) the error would increase dramatically, by 100,000-fold. This 
error is depicted as the light gray 151 km radius buffer within São Paulo State, Brazil. 2. Close-up of 
São Paulo State showing the same buffer and a smaller and darker one that represents the 15.1 km 
expected error if the coordinate contained only one decimal place (e.g., Latitude -22.3, Longitude 
-48.9). 3. A closer view shows that a coordinate with two decimal places (e.g., Latitude -22.33, 
Longitude -48.87) is expected to embody an error of 1.51 km, which decreases to 150 m if the 
coordinate included three decimal places (e.g., Latitude -22.332, Longitude -48.872). The dashed 
polygon in FIG. 3 represents a Protected Area where the original point was obtained. 
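The error radii quoted for FIGS. 1-3 can be approximated with a short calculation. The sketch below assumes a spherical Earth with roughly 111.32 km per degree of latitude and combines the latitudinal and longitudinal rounding errors as the diagonal of a rectangle; the function name and the constant are illustrative assumptions, not necessarily the exact calculation used to draw the figures.

```python
import math

# Assumed mean length of one degree of latitude on a spherical Earth (meters).
METERS_PER_DEGREE = 111_320.0


def precision_error_m(latitude: float, decimal_places: int) -> float:
    """Approximate error radius (m) implied by the number of decimal places.

    One unit in the last recorded decimal place is converted to meters along
    the meridian and along the parallel (which shrinks with cos(latitude)),
    and the two are combined as the diagonal of the resulting rectangle.
    """
    step_deg = 10.0 ** (-decimal_places)
    lat_err = METERS_PER_DEGREE * step_deg
    lon_err = METERS_PER_DEGREE * math.cos(math.radians(latitude)) * step_deg
    return math.hypot(lat_err, lon_err)


if __name__ == "__main__":
    # Tabulate the error for 0 to 5 decimal places at the example latitude.
    for places in range(6):
        print(places, round(precision_error_m(-22.33214, places), 2))
```

Under these assumptions, five decimal places at latitude -22.33214 give an error radius of about 1.5 m, and each decimal place removed multiplies it by ten.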


Retrospective georeferencing of specimen records 

Most herbaria that house fungal specimens collected over the past centuries 
currently face the task of georeferencing a vast amount of historical records, 
inevitably complicated by imprecise text descriptions, inconsistent formatting, 
misspellings, old names that have changed, different languages, and even 
contradictory information about collection sites. Traditionally, herbarium 
specimen labels have included several levels of text information specifying 
the site where the collection was made (such as country, county, city, and/or 
a reference to a place or geographic feature using some measure of distance 
and direction) but rarely including geographic coordinates. Essentially, 
retrospective georeferencing is a hypothesis that interprets quantitatively 
a locality description based on best available geographic information, along 
with associated uncertainty (Wieczorek et al. 2004). It attempts to define a 
standardized process by minimizing subjectivity, but it is a time-consuming 
process (Murphey et al. 2004, Guralnick & Hill 2009, Hill et al. 2009), especially 
if carried out on a specimen-by-specimen basis (Guralnick et al. 2006). 

There are some methods being developed to georeference locality 
descriptions objectively, some of which can be widely implemented even if 
GIS expertise is lacking (Chapman & Wieczorek 2006). Basically, the methods 
classify the locality descriptions, determine coordinates and extents, calculate 
uncertainties, and document the georeferencing process. The Wieczorek et al. 
(2004) point-radius method provides a relatively easy, practical solution for 
georeferencing localities and estimating uncertainties. It takes into account 
aspects of the precision and specificity of the site description, as well as the 
map scale, datum, precision, and accuracy of the sources used to determine 
coordinates. Each locality is described as a circle, with a point marking the 
position most closely described by the site description and a radius describing 
the maximum distance from that point within which the site is expected to 
occur. However, the Wieczorek et al. (2004) method tends to overestimate the 
uncertainty, since it is essentially additive and does not consider the probability 
distribution for each uncertainty source (Guo et al. 2008); some other methods 
have developed more complex estimates of uncertainties as probability surfaces 
(Guo et al. 2008, Liu et al. 2009). 
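The additive character of the point-radius approach can be illustrated with a minimal sketch. It only shows the idea of describing a locality as a point plus a radius obtained by summing independent uncertainty sources; the published method has more detailed rules for each source, and the class name, function name, and all numeric values below are hypothetical.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class PointRadius:
    """A georeference expressed as a point and a maximum-error radius."""
    latitude: float
    longitude: float
    radius_m: float
    datum: str = "WGS84"


def point_radius(latitude: float, longitude: float,
                 uncertainty_sources_m: Mapping[str, float],
                 datum: str = "WGS84") -> PointRadius:
    """Combine uncertainty sources by simple addition (illustrative only)."""
    for name, value in uncertainty_sources_m.items():
        if value < 0:
            raise ValueError(f"uncertainty '{name}' must be non-negative")
    return PointRadius(latitude, longitude,
                       sum(uncertainty_sources_m.values()), datum)


# Hypothetical uncertainty contributions, in meters.
sources = {
    "extent_of_named_place": 2000.0,
    "unknown_datum": 500.0,
    "coordinate_precision": 151.0,
    "map_scale": 100.0,
}
record = point_radius(-22.332, -48.872, sources)
print(record.radius_m)  # 2751.0
```

Because every source is added at its maximum value, the resulting radius is conservative, which is the overestimation noted above.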

Along with setting appropriate methods, there is a general concern for 
developing processing tools to increase the rate of georeferencing locations by 
focusing on automated methods and batch processing (Guralnick et al. 2006, 
Hill et al. 2009). These initiatives apply documented data standards and provide 
essential information about the data processing steps to ensure the process is 
replicable and may be improved upon, as the methods are still being developed 
(Hill et al. 2009). Even so, Wieczorek et al. (2004) recommend checking 
automated results to ensure that they were interpreted correctly. Development 
of these integrative tools will reduce redundancy of effort by returning assigned 
coordinates directly to the data curators, increasing efficiency throughout the process 
(Guralnick et al. 2006, Hill et al. 2009). 

Depending on the data volume to be processed and the world region, another 
promising and increasingly adopted approach to retrospective georeferencing 
is based on Google Earth© (Garcia-Milagros & Funk 2010, Naparus & Kuntner 
2012, Weirauch et al. 2012). This convenient and freely available popular 
software offers a friendly interface that easily allows information visualization, 
accepts data imported from different sources and formats (including decimal 
degrees), includes uncertainty and datum, overlays maps, creates paths, points, 
and polygons, measures linear distances or paths, checks point elevations, 
manages points and associates them with notes and photos, shares data, and 
exports to other software. However, the time needed to georeference each 
specimen may be considerable (Garcia-Milagros & Funk 2010). 


Georeferencing specimen records today 

Collecting data in the field is the first step towards good georeferencing 
procedures (Chapman & Wieczorek 2006). The most important improvement 
in georeferencing practices stems from the Global Positioning System (GPS), a 
satellite-based navigation system which determines location points anywhere 
on Earth and in all weather conditions and is freely accessible to anyone with 
a GPS receiver. Given the high accuracy and precision of GPS devices in 
recording locality data, we strongly recommend GPS use in the field. All GPS 
devices provide latitude and longitude, and some also calculate altitude. Since 
the end of the ‘Selective Availability’ period on 1 May 2000, the accuracy of 
hand-held GPS units improved from more than 100 m (McElroy 1998) to less 
than 1 m. 

The most useful devices for biological surveys are hand-held GPS units, 
which are relatively cheap, small, and effective. Most modern units have a 
color screen that displays map features and allows fast handling, along 
with 12 parallel channels for acquiring GPS satellite signals faster and more 
accurately. It is best to choose waterproof or at least water-resistant models 
with replaceable batteries for collecting specimens in the field. Currently, GPS 
units do not have basemaps by default, but it is possible to transfer maps and 
increase their usefulness. For instance, the BirdsEye Satellite Imagery service 
offers high-resolution color satellite imagery of surrounding environments 
that at an additional cost can be uploaded to Garmin handheld GPS units. The 
highest resolution images can help mycologists navigate, find trails, avoid edges 
in forest fragments, and estimate spatial coverage, among several potential 
applications. Halme et al. (2012) demonstrated how to estimate sampling effort 
by recording tracks produced by GPS during an inventory of wood-inhabiting 
fungi in Finland. All these data require memory space, so devices with a USB 
connection are highly recommended. 

However, GPS data should not replace text descriptions of localities, which 
are essential for validating the points. In contrast to legacy records, modern 
collectors have the advantage of being directly responsible for specimen 
data quality and can use many techniques and tools to ensure accurately 
georeferenced locations. Hence, site descriptions should be as specific, 
unambiguous, complete, and accurate as possible (Chapman & Wieczorek 
2006). Collectors should avoid using vague terms such as ‘near’, ‘along’ and 
‘center of’ without providing an offset distance estimate. To reduce uncertainty, 
localities to be used as reference points should be permanent sites covering 
small areas. If the site can only be tracked down by distances measured along a 
path, road, or river, it is important to be accurate. 

If no GPS device was available in the field, coordinates should be assigned 
as soon after collection as possible from online maps (e.g., Google Earth) or 
gazetteers (e.g., Alexandria Digital Library, Fuzzy Gazetteer). Some tools are 
available globally (e.g., Geonames; www.geonames.org/), while others focus on 
particular regions (e.g., GeoLoc, a web service for finding localities in Brazil; 
http://splink.cria.org.br/geoloc). Various online maps carry more or less detailed 
text information, such as forest district names that are often cited by collectors. Such 
maps are available at the national level (e.g., France: www.geoportail.gouv.fr/accueil; 
Australia: www.aus-emaps.com/topo.php) or sub-national level (e.g., Bavaria, 
Germany: http://geoportal.bayern.de/bayernatlas/). Other useful tools are 
GEOLocate (www.museum.tulane.edu/geolocate/), a platform dedicated to 
georeferencing natural history collections data, and MaNIS Georeferencing 
Calculator (http://manisnet.org/gc.html), mostly helpful in calculating 
offset distances and errors. It is important to assign the coordinates while the 
information is still fresh, so you can easily and accurately locate the geographical 
references. 


How to improve accuracy and precision of GPS data 

FIGURES 4-5. The precision of geographic coordinates depends on the number of decimal places 
in the decimal degrees format. 4. A geographic coordinate expressed in decimal degrees lacking 
decimal places is expected to embody an error of hundreds of kilometers. However, errors vary with 
Latitude; regions at the Equator are expected to include up to 35% more error than the ones near 
the Poles. 5. The magnitude of the error, expressed in meters in a semi-log scale, is negatively related 
to the number of decimal places. The error is expected to decrease 10-fold for every increased 
decimal place, so we recommend recording the most complete available information. The inclusion 
of five decimal places in all measurements ensures enough precision for most applications. 

GPS accuracy depends mostly on the type of GPS unit used, the number of 
satellites visible to your receiver, the strength of satellite signals, the geometric 
positioning of the satellites in the sky, and atmospheric conditions. There is 
always a degree of uncertainty that should be associated with any coordinate. A 
GPS unit uses four or more satellites to calculate your latitude, longitude, and 
altitude on Earth. However, handheld GPS altitude measurements are usually 
inaccurate, so the most reliable measurements are horizontal. Maximizing 
signal strength and the number of satellites available when measuring will 
produce the best horizontal accuracies. Ensure that your device reads the signal 
of at least four satellites, but if possible wait for more satellites to be detected. 
Most GPS units display signal strength and estimated horizontal accuracy; if 
these are poor, wait a few minutes or move to an area with better signal 
reception, taking note of the distance and direction from the original point. 
GPS satellite signals may be impaired by solid objects, such as mountains and 
dense forest canopy, causing errors or even no reading at all. If you are working 
in a closed canopy area, it is useful to turn on the GPS while still in an open area. 
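When a reading has to be taken some distance away from the collection point, the noted distance and direction can later be used to recover the coordinates of the original point. The sketch below uses the standard great-circle destination formula on a spherical Earth; the mean radius, the function names, and the example values are assumptions for illustration, and the reverse-bearing shortcut is only reasonable for the short offsets typical of field work.

```python
import math

# Assumed mean Earth radius (meters); a spherical approximation.
EARTH_RADIUS_M = 6_371_008.8


def destination(lat: float, lon: float, bearing_deg: float,
                distance_m: float) -> tuple[float, float]:
    """Point reached from (lat, lon) after distance_m along bearing_deg.

    Bearing is measured clockwise from true north; output is rounded to
    five decimal places in decimal degrees.
    """
    phi1, lam1 = math.radians(lat), math.radians(lon)
    theta = math.radians(bearing_deg)
    delta = distance_m / EARTH_RADIUS_M  # angular distance in radians
    phi2 = math.asin(math.sin(phi1) * math.cos(delta)
                     + math.cos(phi1) * math.sin(delta) * math.cos(theta))
    lam2 = lam1 + math.atan2(
        math.sin(theta) * math.sin(delta) * math.cos(phi1),
        math.cos(delta) - math.sin(phi1) * math.sin(phi2))
    lon2 = (math.degrees(lam2) + 540.0) % 360.0 - 180.0  # wrap to [-180, 180)
    return round(math.degrees(phi2), 5), round(lon2, 5)


def collection_point(reading_lat: float, reading_lon: float,
                     bearing_from_site_deg: float,
                     distance_m: float) -> tuple[float, float]:
    """Recover the collection site from a reading taken away from it.

    bearing_from_site_deg is the direction walked from the site to the
    place where the GPS reading was taken; the site is found by going
    back along the reverse bearing.
    """
    reverse = (bearing_from_site_deg + 180.0) % 360.0
    return destination(reading_lat, reading_lon, reverse, distance_m)


# Hypothetical example: reading taken 50 m due east of the collection site.
site = collection_point(-22.33214, -48.87140, 90.0, 50.0)
print(site)
```

The offset distance itself should still be recorded with the specimen, since it adds to the uncertainty of the recovered point.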

It is essential to configure the GPS coordinate system and datum in advance. 
We recommend Latitude/Longitude in decimal degrees as the coordinate 
system, and the datum WGS84 as a standard, as it is geocentric and globally 
consistent. Since this configuration is followed by most systems, it will also 
facilitate data exchange. A measurement in decimal degrees to five decimal 
places is recommended (FIGS. 4-5). Most GPS devices do not directly record 
accuracy with the waypoint data but provide it in the interface showing current 
satellite conditions. As the importance of estimating uncertainties also applies 
to GPS-derived data, we recommend noting the accuracy of each point. To 
forestall problems, it is a good practice to record GPS coordinates and associated 
data (decimal latitude, decimal longitude, and accuracy) in a notebook. Most 
importantly, document your data with proper metadata. Metadata is data about 
data, describing the data with information about who, what, when, where, why, 
and how; including the model of the device used is important. 
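The items recommended above (decimal latitude, decimal longitude, accuracy, datum, and device model) can be kept together in one small record that is checked at the time of entry. This is a minimal sketch; the class and field names are illustrative assumptions rather than an established schema, and the example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpsReading:
    """One field reading with the metadata needed to interpret it."""
    decimal_latitude: float
    decimal_longitude: float
    accuracy_m: float          # horizontal accuracy shown by the device
    datum: str = "WGS84"
    device_model: str = ""
    collector: str = ""
    date: str = ""             # e.g. an ISO 8601 date such as "2013-05-17"
    locality: str = ""         # text description, kept alongside coordinates

    def __post_init__(self) -> None:
        # Reject values outside the valid ranges for decimal degrees.
        if not -90.0 <= self.decimal_latitude <= 90.0:
            raise ValueError("latitude must be between -90 and 90")
        if not -180.0 <= self.decimal_longitude <= 180.0:
            raise ValueError("longitude must be between -180 and 180")
        if self.accuracy_m < 0:
            raise ValueError("accuracy must be non-negative")

    def citation(self) -> str:
        """Coordinates to five decimal places, followed by the datum."""
        return (f"Latitude: {self.decimal_latitude:.5f}, "
                f"Longitude: {self.decimal_longitude:.5f}, "
                f"Datum: {self.datum}")


# Hypothetical reading with a 4 m reported accuracy.
reading = GpsReading(-26.92856, -49.04896, accuracy_m=4.0)
print(reading.citation())
# Latitude: -26.92856, Longitude: -49.04896, Datum: WGS84
```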


Standard form for geographical coordinates 

One of the most critical georeferencing steps (and a major source of error) 
is data entry. Reduce errors by adopting good practices. The most convenient 
format for storing and managing geographic coordinates is DECIMAL DEGREES 
(Wieczorek et al. 2004), which relies on just two attributes (decimal latitude, 
decimal longitude) and does not include extra symbols, minimizing the 
chances of transcription errors. Using different variations of the unit symbols 
in the classical degrees, minutes, and seconds input format may produce 
errors of more than 1 km (Hans Otto Baral, pers. comm.). Decimal latitude is 
expressed by positive values to the North and negative values to the South of 
the Equator, varying from 0 to 90 to the North Pole, and 0 to -90 to the South 
Pole. Decimal longitude is expressed by positive values to the East and negative 
values to the West of the Greenwich Meridian, varying from 0 to 180 in the East 
and from 0 to -180 in the West. If possible, record the coordinates in decimal 
degrees to five decimal places, as insufficiently precise coordinates can result in 
large uncertainties (FIGS. 1-3). Error is expected to increase 10-fold for every decimal 
place lost (FIG. 5), so we recommend recording the most complete available 
information. It is essential to provide information about the datum used (e.g., 
Latitude: -26.92856, Longitude: -49.04896, Datum: WGS84). There are online 
tools that can help convert different formats to decimal degrees, such as Converter 
(http://splink.cria.org.br/conversor), an open access web-based service 
developed by the Reference Center on Environmental Information (CRIA) that 
converts different types of geographic coordinates and datums commonly used 
in Brazil, and GeoCalc (www.geocomp.com.au/geocalc/), free software that 
converts coordinate data files between commonly-used mapping systems in the 
world. 
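The conversion from degrees, minutes, and seconds to signed decimal degrees follows directly from the sign conventions described above. The sketch below is an illustrative helper, not the algorithm of any of the tools cited; the function name and the example coordinate are assumptions.

```python
def dms_to_decimal(degrees: float, minutes: float, seconds: float,
                   hemisphere: str) -> float:
    """Convert degrees/minutes/seconds plus N, S, E or W to decimal degrees.

    South and West are returned as negative values; the result is rounded
    to five decimal places.
    """
    hemisphere = hemisphere.strip().upper()
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError("hemisphere must be one of N, S, E, W")
    if degrees < 0 or not 0 <= minutes < 60 or not 0 <= seconds < 60:
        raise ValueError("degrees, minutes, or seconds out of range")
    value = degrees + minutes / 60.0 + seconds / 3600.0
    limit = 90.0 if hemisphere in ("N", "S") else 180.0
    if value > limit:
        raise ValueError(f"value exceeds {limit} degrees")
    sign = -1.0 if hemisphere in ("S", "W") else 1.0
    return round(sign * value, 5)


# 26 degrees 55 minutes 42.8 seconds South
print(dms_to_decimal(26, 55, 42.8, "S"))  # -26.92856
```

Passing the hemisphere as an explicit letter, rather than parsing unit symbols from free text, avoids the symbol-variation errors mentioned above.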


Final comments 

Providing high quality geographic coordinates is an essential step in creating 
species distribution maps, either by simply plotting the occurrence points 
on the map or by modelling and projecting the ecological niche to depict its 
potential distribution (Peterson et al. 2011). Adopting the simple practices 
described here helps ensure the collection of high quality primary spatial data for fungal 
specimens. Data quality could be increased even further if herbarium curators 
and editors of taxonomic, ecological, and conservation journals emphasized 
the importance of providing geographic data for all specimens. Currently, most 
fungal taxonomic journals do not require geographic coordinates as essential 
information of specimens examined, leaving the responsibility of providing 
locality data to collectors and herbarium curators. We encourage researchers 
to obtain high quality data in the field, and editors and curators to request 
geographic coordinates for specimens, greatly improving data availability and 
maximizing their usefulness. 